Hadoop: HDFS Deployment and Testing

Posted by Jackson on 2017-08-13

1. Hadoop

Apache Hadoop docs: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Cloudera Hadoop archive: http://archive.cloudera.com/cdh5/cdh/5/
Tarball download: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz

If the Hadoop version you are running hits a problem, check the changes.log to see whether a later release already carries a fix for it:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log
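If you know the upstream JIRA id of the issue, the changelog can be searched straight from the shell; a quick sketch (the id HDFS-8791 is only a placeholder):

curl -s http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log | grep -i 'HDFS-8791'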

2. Hadoop Installation and Deployment

2.1 Create the hadoop user

[root@bigdata01 ~]# useradd hadoop
[root@bigdata01 ~]# id hadoop
uid=1001(hadoop) gid=1002(hadoop) groups=1002(hadoop)

2.2 Switch to the hadoop user

[root@bigdata01 ~]# su - hadoop
Last login: Mon Dec 2 12:25:32 CST 2019 on pts/0

2.3 Create the directory layout

[hadoop@bigdata01 ~]$ mkdir app software sourcecode log tmp data lib
[hadoop@bigdata01 ~]$ ll
total 0
drwxrwxr-x 3 hadoop hadoop 50 Dec 1 22:17 app
drwxrwxr-x 2 hadoop hadoop 6 Dec 1 22:10 data
drwxrwxr-x 2 hadoop hadoop 6 Dec 1 22:10 lib
drwxrwxr-x 2 hadoop hadoop 6 Dec 1 22:10 log
drwxrwxr-x 2 hadoop hadoop 43 Dec 1 22:14 software
drwxrwxr-x 2 hadoop hadoop 6 Dec 1 22:10 sourcecode
drwxrwxr-x 2 hadoop hadoop 22 Dec 2 12:36 tmp
[hadoop@bigdata01 ~]$

2.4 Upload the tarball to software and extract it into app

[hadoop@bigdata01 software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
[hadoop@bigdata01 app]$ ll
drwxr-xr-x 14 hadoop hadoop 241 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2

2.5 Create a symlink

[hadoop@bigdata01 app]$ ln -s hadoop-2.6.0-cdh5.16.2/ hadoop
[hadoop@bigdata01 app]$ ll
lrwxrwxrwx 1 hadoop hadoop 23 Dec 1 22:17 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 hadoop hadoop 241 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
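The point of the symlink is that a later upgrade only needs the link repointed; HADOOP_HOME and every script keep working unchanged. A sketch, with a hypothetical newer release directory:

ln -sfn hadoop-2.7.0-cdhX.Y.Z/ hadoop    # -sfn replaces the existing link in place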

2.6 Check the JDK

[hadoop@bigdata01 app]$ which java
/usr/java/jdk1.8.0_121/bin/java

2.7 Configure environment variables

[hadoop@bigdata01 ~]$ vim .bashrc 
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
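Depending on the environment, the daemons may also need JAVA_HOME set explicitly in hadoop-env.sh; a minimal sketch reusing the JDK path from section 2.6:

# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_121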

2.8 Verify the environment variables

[hadoop@bigdata01 ~]$ source .bashrc 
[hadoop@bigdata01 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[hadoop@bigdata01 ~]$ echo $HADOOP_HOME
/home/hadoop/app/hadoop

2.9 View hadoop command help

[hadoop@bigdata01 ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  s3guard              manage data on S3
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Apache Hadoop docs: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Cloudera Hadoop docs: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2/hadoop-project-dist/hadoop-common/SingleCluster.html

2.10 Set up passwordless SSH

[hadoop@bigdata01 ~]$ cd ~
# Run ssh-keygen and press Enter three times; this creates the .ssh directory
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
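Before moving on, it is worth confirming that passwordless login actually works (bigdata01 is the hostname used throughout this walkthrough):

ssh bigdata01 date    # should print the date without asking for a password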

Pitfall:
The "yes" you type on a first SSH login is recorded in known_hosts, which stores the host keys of servers you have connected to. If you run into host-key problems, delete the corresponding entry from that file.

[hadoop@bigdata01 ~]$ cd .ssh/
[hadoop@bigdata01 .ssh]$ cat known_hosts
bigdata01 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOwZK88+GuH93o6h17DEP19Ly+m79cw1rpjXTcmqlBOviTG0d8mXGmJoBDpPf/pQA49tWqgeVFcsDfBr9YdCK5w=
192.168.52.50 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOwZK88+GuH93o6h17DEP19Ly+m79cw1rpjXTcmqlBOviTG0d8mXGmJoBDpPf/pQA49tWqgeVFcsDfBr9YdCK5w=
localhost ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOwZK88+GuH93o6h17DEP19Ly+m79cw1rpjXTcmqlBOviTG0d8mXGmJoBDpPf/pQA49tWqgeVFcsDfBr9YdCK5w=
0.0.0.0 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOwZK88+GuH93o6h17DEP19Ly+m79cw1rpjXTcmqlBOviTG0d8mXGmJoBDpPf/pQA49tWqgeVFcsDfBr9YdCK5w=
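Instead of editing known_hosts by hand, ssh-keygen can remove a stale entry for you:

ssh-keygen -R bigdata01    # removes all keys belonging to the hostname bigdata01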

2.11 Format HDFS and start the daemons

[hadoop@bigdata01 ~]$ hdfs namenode -format
When the output contains "... has been successfully formatted", the format succeeded.

Start HDFS:
[hadoop@bigdata01 ~]$ start-dfs.sh 
19/12/02 14:28:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [bigdata01]
bigdata01: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-bigdata01.out
bigdata01: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-bigdata01.out
Starting secondary namenodes [bigdata01]
bigdata01: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-bigdata01.out
19/12/02 14:28:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@bigdata01 ~]$
[hadoop@bigdata01 ~]$ jps
11536 Jps
11416 SecondaryNameNode
11258 DataNode
11131 NameNode
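As an extra check, the NameNode web UI listens on port 50070 by default in Hadoop 2.x; a quick probe from the shell (assuming the port is reachable):

curl -s http://bigdata01:50070/ | head    # should return the NameNode UI's HTML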

Configure the DataNode and SecondaryNameNode to start on bigdata01 as well:
which host the NameNode starts on is controlled by fs.defaultFS in core-site.xml;
which hosts the DataNodes start on is controlled by the hostnames listed in the slaves file;
which host the SecondaryNameNode starts on is controlled by the following properties in hdfs-site.xml:

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>bigdata01:50090</value>
</property>
<property>
  <name>dfs.namenode.secondary.https-address</name>
  <value>bigdata01:50091</value>
</property>
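For reference, a minimal sketch of the matching core-site.xml entry and slaves file for this single-node setup (the hdfs://bigdata01:9000 address matches the note in the HDFS commands section below):

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bigdata01:9000</value>
</property>

# etc/hadoop/slaves: one DataNode hostname per line
bigdata01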

Command help

hadoop: running the bare hadoop command prints the usage listing already shown in section 2.9.


hadoop fs

[hadoop@bigdata01 hadoop]$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]

Common HDFS file commands:

hadoop fs -mkdir /    # the / here refers to hdfs://bigdata01:9000
hadoop fs -put
hadoop fs -get
hadoop fs -cat
hadoop fs -rm
hadoop fs -ls

Example:

[hadoop@bigdata01 hadoop]$ hadoop fs -mkdir /hadooptest
[hadoop@bigdata01 hadoop]$ hadoop fs -ls /
drwxr-xr-x - hadoop supergroup 0 2019-12-02 12:35 /hadooptest
[hadoop@bigdata01 tmp]$ vim test.txt
[hadoop@bigdata01 tmp]$ hadoop fs -put test.txt /hadooptest
[hadoop@bigdata01 tmp]$ hadoop fs -ls /hadooptest
-rw-r--r-- 1 hadoop supergroup 42 2019-12-02 12:36 /hadooptest/test.txt
[hadoop@bigdata01 tmp]$ hadoop fs -cat /hadooptest/test.txt
hadoop hive spark flink impala kudu flume
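To round-trip the file back out of HDFS and confirm nothing was lost (paths as above; test_copy.txt is just a scratch name):

[hadoop@bigdata01 tmp]$ hadoop fs -get /hadooptest/test.txt ./test_copy.txt
[hadoop@bigdata01 tmp]$ diff test.txt test_copy.txt    # no output means the two copies match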